43 research outputs found
Refined Complexity of PCA with Outliers
Principal component analysis (PCA) is one of the most fundamental procedures
in exploratory data analysis and is the basic step in applications ranging from
quantitative finance and bioinformatics to image analysis and neuroscience.
However, it is well-documented that the applicability of PCA in many real
scenarios could be constrained by an "immune deficiency" to outliers such as
corrupted observations. We consider the following algorithmic question about
the PCA with outliers. For a set of $n$ points in $\mathbb{R}^{d}$, how to
learn a subset of points, say 1% of the total number of points, such that the
remaining part of the points is best fit into some unknown $r$-dimensional
subspace? We provide a rigorous algorithmic analysis of the problem. We show
that the problem is solvable in time $n^{O(d^2)}$. In particular, for constant
dimension the problem is solvable in polynomial time. We complement the
algorithmic result by the lower bound, showing that unless Exponential Time
Hypothesis fails, in time $f(d)n^{o(d)}$, for any function $f$ of $d$, it is
impossible not only to solve the problem exactly but even to approximate it
within a constant factor.
Comment: To be presented at ICML 201
Parameterized complexity of PCA
We discuss some recent progress in the study of Principal Component Analysis (PCA) from the perspective of Parameterized Complexity.
Consistency-Checking Problems: A Gateway to Parameterized Sample Complexity
Recently, Brand, Ganian and Simonov introduced a parameterized refinement of
the classical PAC-learning sample complexity framework. A crucial outcome of
their investigation is that for a very wide range of learning problems, there
is a direct and provable correspondence between fixed-parameter
PAC-learnability (in the sample complexity setting) and the fixed-parameter
tractability of a corresponding "consistency checking" search problem (in the
setting of computational complexity). The latter can be seen as generalizations
of classical search problems where instead of receiving a single instance, one
receives multiple yes- and no-examples and is tasked with finding a solution
which is consistent with the provided examples.
Apart from a few initial results, consistency checking problems are almost
entirely unexplored from a parameterized complexity perspective. In this
article, we provide an overview of these problems and their connection to
parameterized sample complexity, with the primary aim of facilitating further
research in this direction. Afterwards, we establish the fixed-parameter
(in)-tractability for some of the arguably most natural consistency checking
problems on graphs, and show that their complexity-theoretic behavior is
surprisingly very different from that of classical decision problems. Our new
results cover consistency checking variants of problems as diverse as (k-)Path,
Matching, 2-Coloring, Independent Set and Dominating Set, among others.
Parameterized k-Clustering: Tractability Island
In k-Clustering we are given a multiset of n vectors X subset Z^d and a nonnegative number D, and we need to decide whether X can be partitioned into k clusters C_1, ..., C_k such that the cost sum_{i=1}^k min_{c_i in R^d} sum_{x in C_i} |x-c_i|_p^p <= D, where |*|_p is the Minkowski (L_p) norm of order p. For p=1, k-Clustering is the well-known k-Median. For p=2, the case of the Euclidean distance, k-Clustering is k-Means. We study k-Clustering from the perspective of parameterized complexity. The problem is known to be NP-hard for k=2 and it is also NP-hard for d=2. It is a long-standing open question whether the problem is fixed-parameter tractable (FPT) for the combined parameter d+k. In this paper, we focus on the parameterization by D. We complement the known negative results by showing that for p=0 and p=infty, k-Clustering is W[1]-hard when parameterized by D. Interestingly, the complexity landscape of the problem appears to be more intricate than expected. We discover a tractability island of k-Clustering: for every p in (0,1], k-Clustering is solvable in time 2^{O(D log D)} (nd)^{O(1)}.
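As an illustration of the cost function in the abstract above, the following minimal sketch evaluates a given partition for the two named special cases only, p=1 (k-Median) and p=2 (k-Means). It relies on the standard facts that the |x-c|_p^p cost separates over coordinates, that a coordinate-wise median is an optimal center for p=1, and that the coordinate-wise mean is optimal for p=2. The function names (`cluster_cost`, `clustering_cost`, `is_yes_certificate`) are hypothetical and chosen for this example; this is not the algorithm from the paper, only a check of a candidate partition against the budget D.

```python
from statistics import mean, median


def cluster_cost(cluster, p):
    """Optimal cost of one cluster, min over centers c of sum |x - c|_p^p.

    Sketch for p = 1 and p = 2 only: the cost separates over coordinates,
    so the best center is the coordinate-wise median (p = 1) or mean (p = 2).
    """
    if p not in (1, 2):
        raise ValueError("this sketch only handles p = 1 and p = 2")
    center_of = median if p == 1 else mean
    cost = 0.0
    # zip(*cluster) iterates over coordinates instead of over points.
    for coords in zip(*cluster):
        c = center_of(coords)
        cost += sum(abs(x - c) ** p for x in coords)
    return cost


def clustering_cost(clusters, p):
    """Total cost of a partition: the sum of the per-cluster optimal costs."""
    return sum(cluster_cost(cluster, p) for cluster in clusters)


def is_yes_certificate(clusters, p, D):
    """True if the given partition witnesses that the cost budget D suffices."""
    return clustering_cost(clusters, p) <= D


if __name__ == "__main__":
    partition = [[(0, 0), (4, 0)], [(5, 5)]]
    print(clustering_cost(partition, 1))
    print(clustering_cost(partition, 2))
    print(is_yes_certificate(partition, 1, D=4))
```

Deciding k-Clustering means searching over all partitions into k clusters; the sketch only verifies one partition, which is the easy direction.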
On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications
Fair clustering is a constrained clustering problem where we need to partition a set of colored points. The fraction of points of each color in every cluster should be more or less equal to the fraction of points of this color in the dataset. The problem was recently introduced by Chierichetti et al. (2017) [1]. We propose a new construction of coresets for fair clustering for Euclidean and general metrics based on random sampling. For the Euclidean space R^d, we provide the first coreset whose size does not depend exponentially on the dimension d. The question of whether such constructions exist was asked by Schmidt et al. (2019) [2] and Huang et al. (2019) [5]. For general metrics, our construction provides the first coreset for fair clustering. New coresets appear to be a handy tool for designing better approximation and streaming algorithms for fair and other constrained clustering variants.
Building Large k-Cores from Sparse Graphs
A popular model to measure network stability is the k-core, that is, the maximal induced subgraph in which every vertex has degree at least k. For example, k-cores are commonly used to model the unraveling phenomena in social networks. In this model, users having fewer than k connections within the network leave it, so the remaining users form exactly the k-core. In this paper we study the question of whether it is possible to make the network more robust by spending only a limited amount of resources on new connections. A mathematical model for the k-core construction problem is the following Edge k-Core optimization problem. We are given a graph G and integers k, b and p. The task is to ensure that the k-core of G has at least p vertices by adding at most b edges.
The previous studies on Edge k-Core demonstrate that the problem is computationally challenging. In particular, it is NP-hard when k = 3, W[1]-hard when parameterized by k+b+p (Chitnis and Talmon, 2018), and APX-hard (Zhou et al., 2019). Nevertheless, we show that there are efficient algorithms with provable guarantees when the k-core has to be constructed from a sparse graph with some additional structural properties. Our results are:
- When the input graph is a forest, Edge k-Core is solvable in polynomial time;
- Edge k-Core is fixed-parameter tractable (FPT) when parameterized by the minimum size of a vertex cover in the input graph. On the other hand, with such parameterization, the problem does not admit a polynomial kernel subject to a widely-believed assumption from complexity theory;
- Edge k-Core is FPT parameterized by the treewidth of the graph plus k. This improves upon a result of Chitnis and Talmon by not requiring b to be small.
Each of our algorithms is built upon a new graph-theoretical result interesting in its own right.
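The k-core definition used in the abstract above can be made concrete with the standard peeling procedure: repeatedly delete vertices of degree less than k until none remain. The sketch below is a generic illustration, not one of the algorithms from the paper; the function name `k_core` and the adjacency-dictionary representation are assumptions made for this example.

```python
from collections import deque


def k_core(adj, k):
    """Return the vertex set of the k-core of an undirected graph.

    `adj` maps each vertex to the set of its neighbours (assumed symmetric).
    Vertices whose remaining degree drops below k are peeled off one by one,
    mirroring the "users leave the network" unraveling model.
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    removed = set()
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue  # a vertex can be queued more than once
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed


if __name__ == "__main__":
    # A path 1 - 2 - 3 has an empty 2-core.
    path = {1: {2}, 2: {1, 3}, 3: {2}}
    print(k_core(path, 2))
    # Adding the single edge {1, 3} (budget b = 1) yields a triangle,
    # whose 2-core contains all three vertices.
    triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
    print(k_core(triangle, 2))
```

The path-to-triangle example is a tiny instance of Edge k-Core with k = 2, b = 1 and p = 3: one added edge is enough to make the 2-core reach three vertices.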
On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications
Fair clustering is a constrained variant of clustering where the goal is to
partition a set of colored points, such that the fraction of points of any
color in every cluster is more or less equal to the fraction of points of this
color in the dataset. This variant was recently introduced by Chierichetti et
al. [NeurIPS, 2017] in a seminal work and became widely popular in the
clustering literature. In this paper, we propose a new construction of coresets
for fair clustering based on random sampling. The new construction allows us to
obtain the first coreset for fair clustering in general metric spaces. For
Euclidean spaces, we obtain the first coreset whose size does not depend
exponentially on the dimension. Our coreset results solve open questions
proposed by Schmidt et al. [WAOA, 2019] and Huang et al. [NeurIPS, 2019]. The
new coreset construction helps to design several new approximation and
streaming algorithms. In particular, we obtain the first true
constant-approximation algorithm for metric fair clustering, whose running time
is fixed-parameter tractable (FPT). In the Euclidean case, we derive the first
$(1+\epsilon)$-approximation algorithm for fair clustering whose time
complexity is near-linear and does not depend exponentially on the dimension of
the space. Besides, our coreset construction scheme is fairly general and gives
rise to coresets for a wide range of constrained clustering problems. This
leads to improved constant-approximations for these problems in general metrics
and near-linear time $(1+\epsilon)$-approximations in the Euclidean metric.
Manipulating Districts to Win Elections: Fine-Grained Complexity
Gerrymandering is a practice of manipulating district boundaries and
locations in order to achieve a political advantage for a particular party.
Lewenberg, Lev, and Rosenschein [AAMAS 2017] initiated the algorithmic study of
a geographically-based manipulation problem, where voters must vote at the
ballot box closest to them. In this variant of gerrymandering, for a given set
of possible locations of ballot boxes and known political preferences of $n$
voters, the task is to identify locations for $k$ boxes out of $m$ possible
locations to guarantee victory of a certain party in at least $l$ districts.
Here integers $k$ and $l$ are some selected parameters.
It is known that the problem is NP-complete already for 4 political parties
and prior to our work only heuristic algorithms for this problem were
developed. We initiate the rigorous study of the gerrymandering problem from
the perspectives of parameterized and fine-grained complexity and provide
asymptotically matching lower and upper bounds on its computational complexity.
We prove that the problem is W[1]-hard parameterized by $k+n$ and that it does
not admit an $f(n,k)\cdot m^{o(\sqrt{k})}$ algorithm for any function $f$ of $k$
and $n$ only, unless Exponential Time Hypothesis (ETH) fails. Our lower
bounds hold already for $2$ parties. On the other hand, we give an algorithm
that solves the problem for a constant number of parties in time
$(m+n)^{O(\sqrt{k})}$.
Comment: Presented at AAAI-2